Cpu memory optimization #3845
base: main
We need the env variable control
)
interpreter_result = interpreter.run()
# Delete the frozen parameters from the module to release CPU memory
Can we gate this by the same env variable as the malloc_trim?
Do you mean something like
low_RAM_mode=1 python flux.py
or a build env variable?
I would say something like TORCHTRT_USE_MALLOC_TRIM=1 python flux.py
and the goal would be for this to go away in 2.11 and be merged into prioritize_host_memory_consumption=True.
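A minimal sketch of the env-variable gate suggested above. The variable name TORCHTRT_USE_MALLOC_TRIM is the one proposed in this thread, not a confirmed setting, and the helper names are hypothetical. It assumes a glibc-based Linux host, where malloc_trim is available; on other platforms the call is skipped.

```python
import ctypes
import ctypes.util
import os

# Name proposed in the review thread; treated here as an assumption.
ENV_FLAG = "TORCHTRT_USE_MALLOC_TRIM"


def malloc_trim_enabled(environ=os.environ) -> bool:
    """Opt-in check: the feature is on only when the flag is exactly "1"."""
    return environ.get(ENV_FLAG, "0") == "1"


def maybe_trim_host_memory(environ=os.environ) -> bool:
    """Call glibc's malloc_trim(0) if the flag is set.

    Returns True only if malloc_trim was actually invoked.
    """
    if not malloc_trim_enabled(environ):
        return False
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        libc.malloc_trim(0)  # ask the allocator to return free heap pages to the OS
    except (OSError, AttributeError):
        # No loadable libc, or a libc without malloc_trim (e.g. macOS, musl).
        return False
    return True
```

With this shape, a run such as TORCHTRT_USE_MALLOC_TRIM=1 python flux.py would opt in, and any other invocation leaves behavior unchanged.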
# Move the weights in the state_dict to CPU
if offload_module_to_cpu:
    deallocate_module(gm, delete_module=False)
Aren't these the same?
# Here we delete the frozen parameters from the graph module. Note this does not affect the submodules. We are going to delete the frozen parameters from the submodules in the convert_module function.
# This is done to release CPU memory.
for attr in dir(gm):
Let's make this opt-in, similar to malloc trim.
Shouldn't this be cleared no matter what?
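To make the two options concrete, here is a rough sketch of an opt-in version of the deletion loop. The "_frozen_param" prefix, the function name, and the enabled switch are assumptions for illustration; the diff above only shows the start of the loop. A SimpleNamespace stands in for the graph module so the sketch runs without torch.

```python
from types import SimpleNamespace
from typing import List


def delete_frozen_params(module, enabled: bool, prefix: str = "_frozen_param") -> List[str]:
    """Delete attributes whose names start with `prefix` and return their names.

    `enabled` models the opt-in switch discussed in the review. Passing True
    unconditionally corresponds to the "clear no matter what" position.
    """
    if not enabled:
        return []
    removed = [name for name in dir(module) if name.startswith(prefix)]
    for name in removed:
        delattr(module, name)  # drop the reference so the host memory can be reclaimed
    return removed


# Stand-in for a graph module holding two frozen parameters and one regular attribute.
gm = SimpleNamespace(_frozen_param0=[1.0], _frozen_param1=[2.0], weight=[3.0])
removed = delete_frozen_params(gm, enabled=True)
```

Only attributes on the top-level object are touched here, which matches the note in the diff that submodules are handled separately.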
        self.tag(subgraphs)
        return self.split()

    def calculate_num_of_break(self, subgraphs: List[Subgraph]) -> int:
I feel like this is too much of a heuristic-based system. A better approach IMO is to calculate a graph size budget based on available memory (or eventually this could be user-specified). Then, for each TRT block, we estimate its size and decide how many subgraphs it should be split into to meet the budget.
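The budget idea above could be sketched roughly as follows. The function names, the fraction of available memory used as the budget, and the byte estimates are all hypothetical; how a TRT block's size is actually estimated is left open in this thread.

```python
import math


def budget_from_available(available_bytes: int, fraction: float = 0.5) -> int:
    """Derive a per-subgraph size budget from available host memory.

    `fraction` is an arbitrary illustrative default, not a tuned value.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return int(available_bytes * fraction)


def num_splits_for_budget(estimated_bytes: int, budget_bytes: int) -> int:
    """Number of subgraphs needed so each piece fits within the budget."""
    if budget_bytes <= 0:
        raise ValueError("budget_bytes must be positive")
    return max(1, math.ceil(estimated_bytes / budget_bytes))


# Example: a block estimated at 10 GiB with a 4 GiB budget needs 3 pieces.
GIB = 1024 ** 3
splits = num_splits_for_budget(10 * GIB, 4 * GIB)
```

A user-specified budget would simply replace the value returned by budget_from_available.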
for subgraph in subgraphs:
    if subgraph.is_acc:
        for i, node in enumerate(subgraph.nodes):
            if "scaled_dot" in str(node.target):
It's fine if we do this for testing, but really we should be taking a much more generic approach rather than assuming only SDPA is a viable break point.
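One possible shape for a more generic approach is to pick break points from per-node size estimates rather than from the op name. This is a sketch under stated assumptions: node_sizes is a hypothetical list of size estimates in topological order, and a real partitioner would also need to check that each cut respects data dependencies.

```python
from typing import List, Sequence


def choose_break_points(node_sizes: Sequence[int], budget: int) -> List[int]:
    """Greedy partition: return indices at which a new subgraph should start.

    A break is placed before node i when adding it would push the running
    total over `budget`. A single node larger than the budget still gets
    its own subgraph, since it cannot be split further here.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    breaks: List[int] = []
    running = 0
    for i, size in enumerate(node_sizes):
        if running and running + size > budget:
            breaks.append(i)
            running = 0
        running += size
    return breaks


# Sizes 4+3 fit in a budget of 8, 5+2 fit, and 6 starts a third subgraph.
break_points = choose_break_points([4, 3, 5, 2, 6], budget=8)
```

Any node can become a break point under this scheme, so SDPA nodes receive no special treatment.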
Description
Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Type of change
Please delete options that are not relevant and/or add your own.
Checklist: